Abstract. One of the most important steps of document image processing is
binarization. The computational requirements of locally adaptive binarization
techniques make them unsuitable for devices with limited computing facilities.
In this paper, we present a computationally efficient implementation of
convolution based locally adaptive binarization techniques that keeps the
performance comparable to the original implementation. The computational
complexity is reduced from $O(W^2 N^2)$ to $O(W N^2)$, where $W \times W$ is
the window size and $N \times N$ is the image size. Experiments over benchmark
datasets show that, relative to the state-of-the-art algorithmic
implementation, the computation time is reduced by 5 to 15 times depending on
the window size while memory consumption remains the same.

Key words: Binarization, Computational Complexity, Mobile Device 

1 Introduction 

Document image binarization has been an extensively studied topic over the past
decades. It is one of the most important steps of any document processing
system. It can be defined as the process of converting a multi-chromatic digital
image into a bi-chromatic one. A multi-chromatic image, also called a color
image, consists of color pixels, each of which is represented by a combination
of three basic color components, viz. red (r), green (g) and blue (b). The range
of values for each of these color components is 0-255. The corresponding gray
scale value f(x,y) for a pixel located at (x, y) may be obtained by using
Eq. (1).

$$f(x, y) = w_r \times r(x, y) + w_g \times g(x, y) + w_b \times b(x, y) \tag{1}$$

where $w_r = 0.299$, $w_g = 0.587$ and $w_b = 0.114$. As $\sum_i w_i = 1$, the
range of f(x,y) is also 0-255. So, a gray scale image can be represented as a
matrix of gray level intensities $F_{M \times N} = [f(x,y)]_{M \times N}$, where
M and N denote the number of rows, i.e. the height of the image, and the number
of columns, i.e. the width of the image, respectively. Similarly, a binarized
image $G_{M \times N}$ can be represented as $[g(x,y)]_{M \times N}$ such that
$g(x,y) \in \{0, 255\}$.
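
As an illustration of Eq. (1), the following minimal sketch converts a small
color image into its gray level matrix. The function name `to_gray` and the use
of nested lists of (r, g, b) tuples are assumptions made only for this example;
rounding to the nearest integer gray level is likewise an assumption, since the
text does not say how fractional values are handled.

```python
# Luminance weights from Eq. (1); they sum to 1, so f(x, y) stays within 0-255.
W_R, W_G, W_B = 0.299, 0.587, 0.114


def to_gray(rgb_rows):
    """Map rows of (r, g, b) tuples to rows of integer gray levels f(x, y)."""
    return [
        # Round to the nearest gray level and clamp to 0-255 for safety.
        [min(255, max(0, round(W_R * r + W_G * g + W_B * b))) for (r, g, b) in row]
        for row in rgb_rows
    ]


# A 2 x 2 test image: white, black, pure red, pure green.
image = [[(255, 255, 255), (0, 0, 0)],
         [(255, 0, 0), (0, 255, 0)]]
gray = to_gray(image)
```

Because green carries the largest weight, a pure green pixel maps to a much
lighter gray level than a pure red one.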

Techniques developed so far for document image binarization are categorized
into two types: global binarization techniques and locally adaptive binarization
techniques. In the first case, the pixels constituting the image are binarized
with a single threshold T as shown in Eq. (2). A number of such techniques
[1]-[4] have been developed, of which Otsu's technique [4] has been found to be
the best in a study conducted by Trier et al. [5]-[6].

$$g(x, y) = \begin{cases} 0 & \text{if } f(x, y) < T \\ 255 & \text{otherwise} \end{cases} \tag{2}$$

Global binarization techniques in general produce good results for noise-free
and homogeneous document images of good quality. But they fail to properly
binarize images with uneven illumination and noise. Locally adaptive
binarization techniques evolved to overcome this problem by binarizing each
pixel with a pixel-specific threshold t(x,y) as shown in Eq. (3).

$$g(x, y) = \begin{cases} 0 & \text{if } f(x, y) < t(x, y) \\ 255 & \text{otherwise} \end{cases} \tag{3}$$
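
The difference between Eq. (2) and Eq. (3) can be sketched in a few lines. The
function names and the list-of-rows image representation below are assumptions
made for illustration; how the thresholds themselves are chosen is left to the
individual techniques.

```python
def binarize_global(f, T):
    """Eq. (2): every pixel is compared with the same threshold T."""
    return [[0 if v < T else 255 for v in row] for row in f]


def binarize_local(f, t):
    """Eq. (3): t holds one threshold t(x, y) per pixel, same shape as f."""
    return [
        [0 if v < tv else 255 for v, tv in zip(f_row, t_row)]
        for f_row, t_row in zip(f, t)
    ]


# One row whose right half is darker overall (uneven illumination):
# text pixels are 120 and 40, paper pixels are 200 and 90.
f = [[200, 120, 90, 40]]
global_result = binarize_global(f, 100)
local_result = binarize_local(f, [[160, 160, 65, 65]])
```

With the single threshold 100, the dimly lit paper pixel (90) is wrongly turned
black, while the per-pixel thresholds separate text from paper in both halves.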

A good number of such adaptive techniques can be found in the literature
[7]-[13]. Among these techniques, Niblack's [11] was found to be the best in
the same study by Trier et al. [5]-[6]. Later, more advanced techniques were
designed, and some of them have been reported in [15]-[19].
Sauvola's [15] text binarization method (TBM) is one of them. This method
calculates t(x,y) from the mean m(x,y) and standard deviation s(x,y) of the gray
levels of the pixels within a window around the subject pixel (x, y), as
described in Eq. (4).

$$t(x, y) = m(x, y)\left[1 + k\left(\frac{s(x, y)}{R} - 1\right)\right] \tag{4}$$

where k is a positive constant and R is the dynamic range of the standard
deviation. Many locally adaptive binarization techniques, including Sauvola's
[15], are convolution based. As a result, the computational complexity and
computation time of such techniques are very high. So far, binarization
techniques have been evaluated on the basis of binarization accuracy only
[20]-[21]. But a study of the computational requirements of algorithms is also
required, especially for real-time systems and resource-constrained computing
devices such as cell-phones, Personal Digital Assistants (PDA), iPhones,
iPod Touch, etc.

The present work is an attempt to reduce the computational complexity of
convolution based binarization techniques while retaining comparable accuracy.
The computational complexity of a global binarization technique is usually
$O(N^2)$, where $N \times N$ is the image size. In convolution based
binarization techniques, selecting the threshold for each pixel requires
computing the mean and standard deviation of the gray level intensities of the
surrounding pixels within the window. So, computing the threshold for a single
pixel has a complexity of $O(W^2)$, where $W \times W$ is the window size, and
the overall complexity is $O(W^2 N^2)$ for an image of $N \times N$ pixels.
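
The $O(W^2 N^2)$ cost is easiest to see in a direct implementation of Eq. (4).
The sketch below is an illustration under stated assumptions, not the authors'
code: the defaults k = 0.5 and R = 128 are values commonly used with Sauvola's
method and are not given in this text, x is taken as the row index, the window
is clipped at the image border, and the window spans W // 2 pixels on each side
of the centre (so it is exactly W wide only for odd W).

```python
import math


def sauvola_threshold_naive(f, x, y, W, k=0.5, R=128.0):
    """Threshold t(x, y) of Eq. (4) computed from every pixel in the window."""
    half = W // 2
    rows, cols = len(f), len(f[0])
    total = total_sq = count = 0
    # Visit every window pixel: O(W^2) work for a single threshold.
    for i in range(max(0, x - half), min(rows, x + half + 1)):
        for j in range(max(0, y - half), min(cols, y + half + 1)):
            v = f[i][j]
            total += v
            total_sq += v * v
            count += 1
    m = total / count
    s = math.sqrt(max(0.0, total_sq / count - m * m))  # population std. dev.
    return m * (1.0 + k * (s / R - 1.0))


def sauvola_binarize_naive(f, W, k=0.5, R=128.0):
    """Apply Eq. (3) with the threshold above at every pixel: O(W^2 N^2)."""
    return [
        [0 if f[x][y] < sauvola_threshold_naive(f, x, y, W, k, R) else 255
         for y in range(len(f[0]))]
        for x in range(len(f))
    ]
```

The two nested loops inside `sauvola_threshold_naive` run once per pixel of the
image, which is exactly the per-pixel $O(W^2)$ term discussed above.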

As the study [22] shows, handheld/mobile devices may not be capable of running
$O(W^2 N^2)$ algorithms within an affordable time. Even on desktop computers,
the time taken by such algorithms may be unsatisfactory to many users. In [23],
Shafait et al. suggested an implementation of such algorithms with a
computational complexity of $O(N^2)$. They proposed a faster calculation of
m(x,y) and s(x,y) using integral images, but at the cost of 5 to 6 times more
memory. In the present work, we propose a novel implementation of convolution
based binarization algorithms, which has a computational complexity of
$O(W N^2)$ and does not require any additional memory. Experimental results on
publicly available standard datasets and on our own dataset are also presented.
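
For comparison, the integral-image idea attributed to [23] can be sketched as
follows. This is a generic illustration of integral images, not the code of
[23]; the function names and the zero-padded layout are assumptions. Two extra
tables are built once, after which the mean and standard deviation of any
window follow from four look-ups per table, independent of W.

```python
def integral_images(f):
    """Return (I, I2): running sums of f and f^2, padded with a zero row/column."""
    rows, cols = len(f), len(f[0])
    I = [[0] * (cols + 1) for _ in range(rows + 1)]
    I2 = [[0] * (cols + 1) for _ in range(rows + 1)]
    for x in range(rows):
        for y in range(cols):
            v = f[x][y]
            I[x + 1][y + 1] = v + I[x][y + 1] + I[x + 1][y] - I[x][y]
            I2[x + 1][y + 1] = v * v + I2[x][y + 1] + I2[x + 1][y] - I2[x][y]
    return I, I2


def window_mean_std(I, I2, x0, y0, x1, y1):
    """Mean and std. dev. over rows x0..x1, columns y0..y1 (inclusive), in O(1)."""
    n = (x1 - x0 + 1) * (y1 - y0 + 1)
    s1 = I[x1 + 1][y1 + 1] - I[x0][y1 + 1] - I[x1 + 1][y0] + I[x0][y0]
    s2 = I2[x1 + 1][y1 + 1] - I2[x0][y1 + 1] - I2[x1 + 1][y0] + I2[x0][y0]
    m = s1 / n
    return m, max(0.0, s2 / n - m * m) ** 0.5


I, I2 = integral_images([[1, 2], [3, 4]])
mean_all, std_all = window_mean_std(I, I2, 0, 0, 1, 1)
```

The speed comes from the two additional tables `I` and `I2`, which is the
source of the extra memory mentioned above.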

2 Present Work 

The computation of mean m(x,y) and standard deviation s(x,y) for each pixel 
(x,y) is the most time consuming operation in convolution based locally adaptive 
binarization techniques. If a window of size W x W pixels is taken around a pixel 
(x,y), then the set of window pixels, S, will have W x W number of elements. 
The performance of such methods is heavily dependent on the size of the window. 
Window size is decided on the basis of pattern stroke and pattern size. It cannot 
be made arbitrarily small. 

A possible way to reduce the execution time of such binarization methods
is to reduce the number of pixels in S by considering only the pixels which
effectively contribute to the computation of the mean and variance within the
window. In the present work, we reduce this number by sampling pixels from S
in some geometric order to form a reduced set $S' \subset S$.

Different geometric structures can be defined to select the contributing
pixels. A few such geometric structures are shown in Fig. 1. S' contains the
pixels corresponding to the black boxes marked in the geometric structures of
Fig. 1. It may be observed that in S', the number of foreground pixels is much
smaller than that of the background pixels for windows of the same size around
both foreground and background pixels. The mean and standard deviation computed
from S' are denoted as m'(x,y) and s'(x,y) respectively.

It is evident that S' is a very small subset of S for all the geometric
structures of Fig. 1. In this context, the formulation of Sauvola's method
is as given in Eq. (5).

$$t'(x, y) = m'(x, y)\left[1 + k\left(\frac{s'(x, y)}{R'} - 1\right)\right] \tag{5}$$

where t'(x,y) denotes the threshold calculated from the reduced set S' for the 
pixel (x,y), R' is the dynamic range of s'(x,y) and k is a positive constant. 
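
A minimal sketch of Eq. (5) is given below. The exact shapes of the structures
GS-1 to GS-6 are not recoverable from the text, so the sampling pattern used
here (the two diagonals of the window, about 2W pixels) is purely hypothetical
and should not be read as any of the authors' structures. The defaults k = 0.5
and R' = 128, the names `diagonal_offsets` and `sauvola_threshold_reduced`, and
the choice to skip samples that fall outside the image are also assumptions.

```python
import math


def diagonal_offsets(W):
    """Hypothetical pattern for S': both diagonals of the window (~2W pixels)."""
    half = W // 2
    offsets = {(d, d) for d in range(-half, half + 1)}
    offsets |= {(d, -d) for d in range(-half, half + 1)}
    return sorted(offsets)


def sauvola_threshold_reduced(f, x, y, offsets, k=0.5, R_prime=128.0):
    """Threshold t'(x, y) of Eq. (5), computed from the sampled pixels only."""
    rows, cols = len(f), len(f[0])
    total = total_sq = count = 0
    for dx, dy in offsets:  # O(W) samples instead of O(W^2)
        i, j = x + dx, y + dy
        if 0 <= i < rows and 0 <= j < cols:  # skip samples outside the image
            v = f[i][j]
            total += v
            total_sq += v * v
            count += 1
    m = total / count  # m'(x, y); the centre pixel is always sampled
    s = math.sqrt(max(0.0, total_sq / count - m * m))  # s'(x, y)
    return m * (1.0 + k * (s / R_prime - 1.0))
```

For a window of W = 21 this pattern reads 41 pixels per threshold instead of
441, which is where the reduction from $O(W^2)$ to $O(W)$ per pixel comes from.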



4 Ayatullah Faruk Mollah 1 , Subhadip Basu 2 , and Mita Nasipuri 2 



[Figure: six window patterns, panels (a) GS-1, (b) GS-2, (c) GS-3, (d) GS-4,
(e) GS-5, (f) GS-6]

Fig. 1. Various geometric structures for selection of representative pixels in a
window



3 Experimental Results 



The proposed implementation has been tested on the printed as well as
handwritten images used for benchmarking the performance of various
binarization techniques in the Document Image Binarization Contest (DIBCO) 2009
[24] and the Handwritten Document Image Binarization Contest (H-DIBCO) 2010
[25]. It has also been tested on our own dataset (CMATERdb-6). This dataset
contains 5 representative images. The first is a handwritten Bengali document
in which the text on the rear side is visible from the front side and the image
is unevenly illuminated. The second is an old historical printed Bengali
document. The third is an image of printed English text captured from a notice
board by a cell-phone camera. The fourth is an old printed document having text
in multiple fonts and font sizes. The last is a business card image captured
with a cell-phone camera.

Ground truth data for the DIBCO and H-DIBCO datasets are publicly available.
We have prepared the ground truth data for the images of the CMATERdb-6
dataset. The original and ground truth images of this dataset are publicly
available for research purposes at [26]. These three datasets together have 25
representative images containing various kinds of degradations and
deformations. The results obtained with the proposed implementation have been
compared with the original/algorithmic implementation of Sauvola's binarization
method, since it is one of the best convolution-based binarization methods.
Results obtained with Niblack's as well as Otsu's binarization technique are
also given.

3.1 Performance Analysis

The present implementation starts from the conjecture that the threshold value
t'(x,y) obtained from m'(x,y) and s'(x,y) is not considerably different from
t(x,y) computed from m(x,y) and s(x,y). As a result, the performance remains
comparable. The binarized result obtained with the new threshold t'(x,y) may
not be exactly the same as that obtained with t(x,y), but experiments show that
the results obtained with the presented technique serve the purpose of
binarization very well.

Comparing the output images obtained using the various geometric structures of
Fig. 1(a-f) with their ground truth images, we find the number of true positives
(TP), the number of true negatives (TN), the number of false positives (FP) and
the number of false negatives (FN). The definition of F-Measure (FM) in terms of
recall rate (R) and precision rate (P) is given in Eq. (6).

$$FM = \frac{2 \times R \times P}{R + P} \tag{6}$$

where $R = \frac{TP}{TP + FN}$ and $P = \frac{TP}{TP + FP}$. In an ideal
situation, i.e. when the output image is identical to the ground truth image,
R, P and FM should all be 100%. While calculating the F-Measure, the best
combination of window size (W) and k has been considered in all cases.
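
The F-Measure of Eq. (6) can be computed from a binarized image and its ground
truth as sketched below. Treating black (0) text pixels as the positive class
is an assumption made here, as are the function name and the convention of
returning 0 when there are no true positives.

```python
def f_measure(output, truth, foreground=0):
    """F-Measure (%) of Eq. (6); foreground pixels are the positive class."""
    tp = fp = fn = 0
    for out_row, gt_row in zip(output, truth):
        for o, g in zip(out_row, gt_row):
            if o == foreground and g == foreground:
                tp += 1  # text pixel correctly detected
            elif o == foreground:
                fp += 1  # background pixel marked as text
            elif g == foreground:
                fn += 1  # text pixel missed
    if tp == 0:
        return 0.0  # convention chosen here to avoid division by zero
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return 100.0 * 2 * recall * precision / (recall + precision)


truth = [[0, 0, 255, 255]]
perfect = f_measure(truth, truth)
partial = f_measure([[0, 255, 0, 255]], truth)
```

In the second call one text pixel is found, one is missed and one background
pixel is wrongly marked, so recall and precision are both 1/2.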

Table 1 shows the F-Measures achieved with Otsu, Niblack, Sauvola and the
proposed implementations of Sauvola's method for the DIBCO image dataset. It
contains 5 printed (1-5) and 5 handwritten (6-10) images. Bold cells represent
the highest F-Measure achieved for the corresponding image. It may be noted that
the highest mean F-Measure (91.13%) has been achieved with GS-3. Moreover, the
mean F-Measures achieved with GS-4, GS-5 and GS-6 are greater than that of
Sauvola's method. The mean F-Measure with GS-2 is equal to that of Sauvola.

Similar to Table 1, Table 2 shows the F-Measures achieved with the H-DIBCO
images. It contains 10 representative handwritten document images. The proposed
implementations have achieved the highest F-Measures for 6 images out of 10.
The implementation referred to as GS-3 alone has yielded 3 highest F-Measures.
Although the mean F-Measure is highest for Sauvola, the F-Measures of the
proposed implementations are close to it.

Table 3 shows the F-Measures achieved with the CMATERdb-6 images. It is
noteworthy that the highest mean F-Measure, i.e. 92.48%, has been achieved for
GS-6, whereas the mean F-Measure for Sauvola's method is 92.42%. It may also be
noted that Otsu's global binarization method has given the highest F-Measure
for the fourth image.

A comparison of the mean F-Measures achieved for all 25 images of the
3 datasets with the various techniques is shown in Fig. 2. The highest
F-Measure, i.e. 90.31%, is achieved with GS-3. The F-Measures achieved with all
the present implementations are greater than that of Niblack. Three
implementations, viz. GS-1, GS-2 and GS-4, have yielded F-Measures slightly
lower than that of Sauvola, and the remaining implementations, viz. GS-3, GS-5
and GS-6, have yielded F-Measures slightly higher than that of Sauvola. This
shows that the results

Table 1. F-Measures achieved with different techniques/implementations for
DIBCO images (Bold cells represent the highest F-Measure for the corresponding
image)

F-Measures (%)
Image   Otsu    Niblack  Sauvola  GS-1    GS-2    GS-3    GS-4    GS-5    GS-6
1       91.06   88.12    91.64    91.18   91.83   91.88   91.61   91.91   91.71
2       96.56   94.76    96.39    95.58   96.27   96.16   96.14   96.40   96.25
3       96.71   88.65    95.82    94.90   95.83   95.58   95.68   95.96   95.77
4       82.59   90.29    92.93    91.70   93.02   92.88   92.56   92.93   92.86
5       89.58   85.59    89.81    88.49   89.85   89.36   89.33   89.87   89.62
6       90.85   92.70    92.34    91.64   91.96   92.15   91.77   91.78   91.98
7       86.15   75.02    86.65    89.64   88.99   89.41   89.21   89.06   89.23
8       84.11   88.19    87.99    88.08   88.02   89.08   88.74   88.46   88.92
9       40.56   86.20    88.62    88.09   88.65   89.24   88.52   88.79   88.99
10      28.04   85.62    85.38    86.36   83.13   85.59   85.13   85.05   84.81
Mean    78.62   87.51    90.76    90.57   90.76   91.13   90.89   91.02   91.01



Table 2. F-Measures achieved with different techniques/implementations for
H-DIBCO images (Bold cells represent the highest F-Measure for the
corresponding image)

F-Measures (%)
Image   Otsu    Niblack  Sauvola  GS-1    GS-2    GS-3    GS-4    GS-5    GS-6
1       91.47   90.98    91.23    89.39   90.82   89.71   90.28   91.37   90.57
2       88.18   88.46    89.03    89.86   88.32   89.47   89.53   88.64   89.26
3       84.36   81.78    85.64    84.16   85.01   84.36   84.45   85.15   84.63
4       85.62   89.80    89.67    89.29   89.82   89.84   89.45   89.60   89.69
5       88.28   84.57    92.26    92.91   93.20   93.51   93.19   93.23   93.38
6       80.38   84.38    84.09    84.04   83.77   84.11   84.42   84.33   84.33
7       90.12   89.57    90.87    90.76   90.69   91.13   90.84   90.79   91.09
8       85.68   88.32    88.23    87.27   88.01   88.29   88.30   88.12   88.32
9       81.28   88.43    88.42    88.40   87.88   87.88   87.85   87.92   87.92
10      79.25   87.67    87.60    87.90   86.00   85.91   85.54   85.67   85.66
Mean    85.46   87.40    88.70    88.40   88.35   88.42   88.39   88.48   88.49



Table 3. F-Measures achieved with different techniques/implementations for
CMATERdb-6 images (Bold cells represent the highest F-Measure for the
corresponding image)

F-Measures (%)
Image   Otsu    Niblack  Sauvola  GS-1    GS-2    GS-3    GS-4    GS-5    GS-6
1       88.00   89.71    89.98    90.10   90.09   90.16   90.05   89.96   90.14
2       88.88   88.68    89.04    88.97   89.07   89.05   89.06   89.07   89.07
3       91.93   92.89    93.41    93.00   93.37   93.46   93.29   93.38   93.46
4       99.04   95.64    98.02    97.01   97.42   97.90   97.80   97.82   97.94
5       91.06   90.94    91.65    91.40   91.63   91.70   91.81   91.77   91.80
Mean    91.78   91.57    92.42    92.10   92.32   92.45   92.40   92.40   92.48

[Figure: mean F-Measures; horizontal axis "Various Implementations": Otsu,
Niblack, Sauvola, GS-1, GS-2, GS-3, GS-4, GS-5, GS-6]

Fig. 2. Mean F-Measures computed for all images of the 3 benchmarking
datasets with various techniques and implementations

Fig. 3. Sample images and binarized results with various techniques. (a,e,i)
Image #7 of [24], #2 of [25] and #4 of [26] respectively, (b,f,j) Binarized
images with Sauvola's method, (c,g,k) Binarized images with GS-3, GS-1 and GS-3
respectively, (d,h,l) Ground truth images






with the proposed implementations are comparable with the results of Sauvola.
Fig. 3 shows some sample images from the above datasets and their binarized
images for some of the techniques.

3.2 Computational Complexity and Computation Time 

The proposed technique calculates the threshold for each pixel in $O(W)$ time.
So, the computational complexity of the proposed technique is $O(W N^2)$. A plot
of the mean computation times of Niblack, Sauvola and the proposed techniques,
measured on a moderately powerful notebook (DualCore T2370, 1.73 GHz, 1GB RAM,
1MB L2 Cache), is shown in Fig. 4. It may be observed from Fig. 4 that the
computation time of the proposed technique is much less than that of Niblack's
and Sauvola's implementations.
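
The effect of the reduced complexity can be made concrete by counting pixel
reads for the setting of Fig. 4 (a 1024x768 image and a 20x20 window). The
figure of about 2W samples per window is an assumption used only for this
illustration; the actual number depends on the geometric structure chosen, and
measured speedups also depend on constant factors that this count ignores.

```python
def pixel_reads(width, height, samples_per_window):
    """Total gray level reads needed to threshold every pixel once."""
    return width * height * samples_per_window


W = 20
full = pixel_reads(1024, 768, W * W)     # full window: W^2 reads per pixel
reduced = pixel_reads(1024, 768, 2 * W)  # assumed: about 2W reads per pixel
ratio = full / reduced                   # grows linearly with W (here W / 2)
```

Under this assumption the ratio equals W / 2, so larger windows benefit more,
which is in line with the reported dependence of the speedup on window size.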




[Figure: mean computation times; horizontal axis "Various Implementations":
Nib., Sau., GS-1, GS-2, GS-3, GS-4, GS-5, GS-6]

Fig. 4. Plot of mean computation times of Niblack, Sauvola and proposed
techniques (for images of resolution 1024x768 with 20x20 window size)



3.3 Memory Consumption 

As storing a pixel of a gray scale image requires 1 byte of memory, an
$N \times N$ image requires $S_z = N \times N$ bytes of memory. The algorithmic
implementation of Niblack's and Sauvola's techniques makes a copy of the image
before binarizing its pixels by convolving the window. So, the memory
consumption of this implementation is $2 \times S_z + c_1$ bytes, where $c_1$
is a constant.

The implementation proposed by Shafait et al. [23] is faster, but it requires
additional memory. It prepares two integral images from the given image: one
for the intensity values and the other for the squares of the intensity values.
To store these integral images with 32-bit and 64-bit integers respectively, we
need $4 \times S_z$ and $8 \times S_z$ bytes of memory. So, the memory
consumption in this case is $12 \times S_z + c_2$ bytes, where $c_2$ is another
constant. It may be noted that this is 6 times higher than the memory
consumption of the algorithmic implementation.

The memory consumption of our implementation is $2 \times S_z + c_3$ bytes,
where $c_3$ is another constant. It may be noted that this implementation
requires no additional memory compared to the original/algorithmic
implementation.
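
The three memory figures above can be compared for a concrete image size. The
sketch below simply evaluates the stated multiples of $S_z$ for a 1024x768
image (the resolution used in Fig. 4); the constants $c_1$, $c_2$ and $c_3$ are
not given in the text and are left out, and the function name is hypothetical.

```python
def memory_bytes(n_rows, n_cols):
    """Dominant memory terms (bytes) for the three implementations."""
    s_z = n_rows * n_cols  # 1 byte per gray scale pixel
    return {
        "algorithmic": 2 * s_z,       # image plus one copy
        "integral_images": 12 * s_z,  # 4*S_z (32-bit) + 8*S_z (64-bit)
        "proposed": 2 * s_z,          # same as the algorithmic version
    }


usage = memory_bytes(768, 1024)
overhead = usage["integral_images"] / usage["algorithmic"]
```

For this image size the integral-image variant needs about 9 MiB against
1.5 MiB for the other two, which matters on devices with little RAM.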

4 Conclusion 

In this paper, we have presented a novel implementation of convolution based
locally adaptive binarization techniques. Both the computational complexity and
the computation time are significantly reduced while keeping the performance
close to that of the ordinary implementation. The computational complexity has
been reduced from $O(W^2 N^2)$ to $O(W N^2)$, and the computation time has been
reduced by 5 to 15 times depending on the window size. At the same time, the
memory consumption is the same as that of the original implementation. This
type of implementation is especially useful in image analysis and document
processing for real-time systems and for handheld mobile devices having limited
computational facilities. As camera based applications on mobile devices have
recently become considerably more common, the presented technique should be
highly useful.

also thankful to the School of Mobile Computing and Communication (SMCC)
for providing fellowship to him.

References 

1. Abutaleb, A.S.: Automatic thresholding of gray-level pictures using two-dimensional 
entropy. Computer Vision, Graphics, and Image Processing, 47, 22-32 (1989) 

2. Kapur, J.N., Sahoo, P.K., Wong, A.K.C.: A new method for gray level picture 
thresholding using the entropy of the histogram. Computer Vision, Graphics, and 
Image Processing, 29, 273-285 (1985) 

3. Kittler, J., Illingworth, J.: Minimum error thresholding. Pattern Recognition, 19(1), 
41-47 (1986) 

4. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. 
Systems, Man, and Cybernetics, 9(1), 62-66 (1979) 

5. Trier, Ø.D., Taxt, T.: Evaluation of binarization methods for document images. IEEE
Trans. Pattern Anal. Mach. Intell., 17(3), 312-315 (1995)

6. Trier, Ø.D., Jain, A.K.: Goal-directed evaluation of binarization methods. IEEE
Trans. Pattern Anal. Mach. Intell., 17(12), 1191-1201 (1995)

7. Bernsen, J.: Dynamic thresholding of grey-level images. In: Eighth Int'l Conf. on 
Pattern Recognition, pp. 1251-1255, Paris (1986) 

8. Nakagawa, Y., Rosenfeld, A.: Some experiments on variable thresholding. Pattern 
Recognition, 11(3), 191-204 (1979) 

9. Eikvil, L., Taxt, T., Moen, K.: A fast adaptive method for binarization of document 
images. In: First Int'l Conf. on Document Analysis and Recognition, pp. 435-443, 
Saint-Malo, France (1991) 






10. Mardia, K.V., Hainsworth, T.J.: A spatial thresholding method for image
segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence, 10(6), 919-927 (1988)

11. Niblack, W.: An Introduction to Digital Image Processing. Prentice-Hall, Engle- 
wood Cliffs, NJ, pp. 115-116 (1986) 

12. White, J.M., Rohrer, G.D.: Image thresholding for optical character recognition 
and other applications requiring character image extraction. IBM J. Research and 
Development, 27(4), 400-411 (1983) 

13. Parker, J.R.: Gray level thresholding in badly illuminated images. IEEE Trans. 
Pattern Analysis and Machine Intelligence, 13(8), 813-819 (1991) 

14. Trier, Ø.D., Taxt, T.: Improvement of integrated function algorithm for binarization
of document images. Pattern Recognition Letters, 16(3), 277-283 (1995)

15. Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern
Recognition, 33, 225-236 (2000)

16. Seeger, M., Dance, C.: Binarizing camera images for OCR. In: 6th Int'l Conf. on
Document Analysis and Recognition, pp. 54-58 (2001)

17. Wolf, C., Jolion, J.M., Chassaing, F.: Text localization, enhancement and
binarization in multimedia documents. In: Int'l Conf. on Pattern Recognition,
pp. 1037-1040 (2002)

18. Gatos, B., Pratikakis, I., Perantonis, S.J.: Adaptive degraded document image 
binarization. Pattern Recognition, 39(3), 317-327 (2006) 

19. Shin, K.T., Jang, I.H., Kim, N.C.: Block adaptive binarization of ill-conditioned 
business card images acquired in a PDA using a modified quadratic filter. IET Image 
Processing, 1(1), 56-66 (2007) 

20. Gatos, B., Ntirogiannis, K., Pratikakis, I.: ICDAR 2009 Document Image Bina- 
rization Contest (DIBCO 2009). In: 10th Int'l Conf. on Document Analysis and 
Recognition, pp. 1375-1382, Spain (2009) 

21. Pratikakis, I., Gatos, B., Ntirogiannis, K.: H-DIBCO 2010 - Handwritten Docu- 
ment Image Binarization Competition. In: 12th Int'l Conf. on Frontiers in Hand- 
writing Recognition, pp. 727-732, India (2010) 

22. Dunlop, M.D., Brewster, S.A.: The Challenge of Mobile Devices for Human Com- 
puter Interaction. Personal and Ubiquitous Computing, 6(4), 235-236 (2002) 

23. Shafait, F., Keysers, D., Breuel, T.M.: Efficient implementation of local adaptive 
thresholding techniques using integral images. In: Document Recognition and Re- 
trieval XV, San Jose, USA (2008) 

24. DIBCO 2009 Benchmarking Dataset,
http://users.iit.demokritos.gr/~bgat/DIBCO2009/benchmark

25. H-DIBCO 2010 Benchmarking Dataset,
http://users.iit.demokritos.gr/~bgat/H-DIBCO2010/benchmark

26. CMATER Database Repository, http://code.google.com/p/cmaterdb



